A Framework for Identifying Textual Redundancy
Abstract
Identifying redundant information in documents generated from multiple sources poses a significant challenge for summarization and question-answering (QA) systems. Traditional clustering techniques detect redundancy at the sentence level and do not guarantee that all information within the document is preserved. We discuss an algorithm that generates a novel graph-based representation of a document and then uses a set cover approximation algorithm to remove redundant text from it. Our experiments show that this approach offers a significant performance advantage over clustering when evaluated on an annotated dataset.
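The abstract does not say which set cover approximation is used or how the graph representation feeds into it. As a rough illustration only, the sketch below applies the standard greedy set cover heuristic to a toy document. The mapping from sentences to "information units", the names `greedy_set_cover`, `sentences`, and `units`, and the data itself are all hypothetical and are not taken from the paper.

```python
def greedy_set_cover(universe, subsets):
    """Greedy approximation: repeatedly pick the subset that covers
    the largest number of still-uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Subset with the largest overlap with what is still uncovered.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical toy document: each sentence id maps to the information
# units it expresses (an assumption made for illustration).
sentences = {
    "s1": {"a", "b"},
    "s2": {"b"},
    "s3": {"c", "d"},
    "s4": {"a", "b", "c"},
}
units = set().union(*sentences.values())

# Sentences left out of `kept` are redundant with respect to the kept ones.
kept = greedy_set_cover(units, sentences)
print(kept)
```

In this toy case the kept sentences still express every information unit, which mirrors the abstract's concern that removing redundancy should not lose information. The greedy heuristic is only one possible approximation; the paper may use a different one.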
Similar resources
Linguistic Redundancy in Twitter
In the last few years, the research community's interest in microblogs and social media services, such as Twitter, has been growing exponentially. Yet, so far not much attention has been paid to a key characteristic of microblogs: the high level of information redundancy. The aim of this paper is to systematically approach this problem by providing an operational definition of redundancy. We ca...
Identifying and Ranking the Important Textual and Paratextual Elements in Fiction Retrieval
Purpose: The purpose of this study is to identify the textual and paratextual elements in retrieving fiction from the readers’ perspective in order to provide the most appropriate access points for the readers and to improve access to fictions based on the readers’ needs. Method: The current research is an applied study in terms of purpose, applying a mixed method that was conducted using the ...
TEXTUAL AND INTER-TEXTUAL ANALYSES OF IRANIAN EFL UNDERGRADUATES’ TYPES OF ENGLISH READING TOWARDS DEVELOPING A CAREFUL READING FRAMEWORK
This study investigated textual and inter-textual reading of a group of Iranian EFL undergraduates’ careful English reading types. In this research, Khalifa and Weir’s (2009) reading framework was used to propose a more inclusive aspect of a careful reading framework and the reading construct for instructional and assessment goals. The participants of this study were B.A. students of English Tr...
A Micro- and Macro-Level Descriptive-Analytical Study of Translation Criticism in Iran: Are We Moving within a Framework?
The present corpus-driven study addresses the current situation of translation criticisms published in print or online in the Iranian media. A sample of 17 criticisms (roughly 68,000 words altogether) from a variety of valid media outlets was compiled. Having been categorized into those with, and those without an explicit theoretical framework, the criticisms were examined on two levels...
Fault Tolerant Reversible QCA Design using TMR and Fault Detecting by a Comparator Circuit
Quantum-dot Cellular Automata (QCA) is an emerging and promising technology that provides significant improvements over CMOS. Recently QCA has been advocated as an applicant for implementing reversible circuits. However QCA, like other Nanotechnologies, suffers from a high fault rate. The main purpose of this paper is to develop a fault tolerant model of QCA circuits by redundancy in hardware a...
Publication date: 2008